RedwineQuality by Pasit Nusso

##  [1] "fixed.acidity"        "volatile.acidity"     "citric.acid"         
##  [4] "residual.sugar"       "chlorides"            "free.sulfur.dioxide" 
##  [7] "total.sulfur.dioxide" "density"              "pH"                  
## [10] "sulphates"            "alcohol"              "quality"

Univariate Plots Section

## 'data.frame':    1599 obs. of  12 variables:
##  $ fixed.acidity   : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity: num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid     : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ sugar           : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides       : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.SO2        : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.SO2       : num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density         : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH              : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates       : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol         : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality         : int  5 5 5 6 5 5 5 7 7 5 ...
##  fixed.acidity   volatile.acidity  citric.acid        sugar       
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900  
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900  
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200  
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539  
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600  
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500  
##    chlorides          free.SO2       total.SO2         density      
##  Min.   :0.01200   Min.   : 1.00   Min.   :  6.00   Min.   :0.9901  
##  1st Qu.:0.07000   1st Qu.: 7.00   1st Qu.: 22.00   1st Qu.:0.9956  
##  Median :0.07900   Median :14.00   Median : 38.00   Median :0.9968  
##  Mean   :0.08747   Mean   :15.87   Mean   : 46.47   Mean   :0.9967  
##  3rd Qu.:0.09000   3rd Qu.:21.00   3rd Qu.: 62.00   3rd Qu.:0.9978  
##  Max.   :0.61100   Max.   :72.00   Max.   :289.00   Max.   :1.0037  
##        pH          sulphates         alcohol         quality     
##  Min.   :2.740   Min.   :0.3300   Min.   : 8.40   Min.   :3.000  
##  1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50   1st Qu.:5.000  
##  Median :3.310   Median :0.6200   Median :10.20   Median :6.000  
##  Mean   :3.311   Mean   :0.6581   Mean   :10.42   Mean   :5.636  
##  3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10   3rd Qu.:6.000  
##  Max.   :4.010   Max.   :2.0000   Max.   :14.90   Max.   :8.000

The median wine quality is 6 (the most common score is 5), and scores range from 3 to 8. The mean alcohol content is 10.42%.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.60    7.10    7.90    8.32    9.20   15.90

The fixed.acidity distribution is approximately normal, with a slight right skew (mean 8.32 vs. median 7.9).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800

The volatile.acidity distribution appears multimodal, with peaks near 0.4, 0.5 and 0.6.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.090   0.260   0.271   0.420   1.000
## feature
##    0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09  0.1 0.11 0.12 0.13 0.14 
##  132   33   50   30   29   20   24   22   33   30   35   15   27   18   21 
## 0.15 0.16 0.17 0.18 0.19  0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 
##   19    9   16   22   21   25   33   27   25   51   27   38   20   19   21 
##  0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39  0.4 0.41 0.42 0.43 0.44 
##   30   30   32   25   24   13   20   19   14   28   29   16   29   15   23 
## 0.45 0.46 0.47 0.48 0.49  0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 
##   22   19   18   23   68   20   13   17   14   13   12    8    9    9    8 
##  0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69  0.7 0.71 0.72 0.73 0.74 
##    9    2    1   10    9    7   14    2   11    4    2    1    1    3    4 
## 0.75 0.76 0.78 0.79    1 
##    1    3    1    1    1

The distribution of citric acid appears multimodal, with peaks at 0, 0.24 and 0.49.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500
## feature
##  0.9  1.2  1.3  1.4  1.5  1.6 1.65  1.7 1.75  1.8  1.9    2 2.05  2.1 2.15 
##    2    8    5   35   30   58    2   76    2  129  117  156    2  128    2 
##  2.2 2.25  2.3 2.35  2.4  2.5 2.55  2.6 2.65  2.7  2.8 2.85  2.9 2.95    3 
##  131    1  109    1   86   84    1   79    1   39   49    1   24    1   25 
##  3.1  3.2  3.3  3.4 3.45  3.5  3.6 3.65  3.7 3.75  3.8  3.9    4  4.1  4.2 
##    7   15   11   15    1    2    8    1    4    1    8    6   11    6    5 
## 4.25  4.3  4.4  4.5  4.6 4.65  4.7  4.8    5  5.1 5.15  5.2  5.4  5.5  5.6 
##    1    8    4    4    6    2    1    3    1    5    1    3    1    8    6 
##  5.7  5.8  5.9    6  6.1  6.2  6.3  6.4 6.55  6.6  6.7    7  7.2  7.3  7.5 
##    1    4    3    4    4    3    2    3    2    2    2    1    1    1    1 
##  7.8  7.9  8.1  8.3  8.6  8.8  8.9    9 10.7   11 12.9 13.4 13.8 13.9 15.4 
##    2    3    2    3    1    2    1    1    1    2    1    1    2    1    2 
## 15.5 
##    1

## 90% 
## 3.6

The long-tailed data are transformed to better show the distribution of sugar. The distribution of sugar is right skewed: 90% of the wines have residual sugar at or below 3.6 g / dm^3, far under the 45 g / dm^3 above which wines are considered sweet.
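The report does not say which transformation was applied to the long-tailed features. As a minimal sketch, assuming a log10 transform and using a simulated right-skewed sample as a hypothetical stand-in for the sugar column, the effect on skewness and the 90% quantile can be checked like this:

```r
# Hypothetical stand-in for the sugar column: a simulated right-skewed sample.
# The real data and the author's actual transformation are not shown in the report.
set.seed(42)
sugar <- rlnorm(1599, meanlog = log(2.2), sdlog = 0.4)

# Sample skewness (third standardized moment); positive means a long right tail.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Value at or below which 90% of the observations fall.
p90 <- quantile(sugar, probs = 0.90)

# Compare skewness before and after the assumed log10 transform.
raw_skew <- skewness(sugar)
log_skew <- skewness(log10(sugar))

print(p90)
print(c(raw = raw_skew, log10 = log_skew))
```

A skewness much closer to zero after the transform is what makes a histogram on the log scale easier to read than one on the raw scale.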

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
##    95% 
## 0.1261

The long-tailed data are transformed to better show the distribution of chlorides. The distribution of chlorides is right skewed: 95% of the wines have chlorides at or below 0.1261.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   14.00   15.87   21.00   72.00
## 95% 
##  35

The long-tailed data are transformed to better show the distribution of free.SO2. The free.SO2 distribution is bimodal, with peaks at 7 and 17.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   22.00   38.00   46.47   62.00  289.00
##   95% 
## 112.1

The long-tailed data are transformed to better show the distribution of total.SO2. After the transformation, the total.SO2 distribution appears approximately normal.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.210   3.310   3.311   3.400   4.010
## feature
## 2.74 2.86 2.87 2.88 2.89  2.9 2.92 2.93 2.94 2.95 2.98 2.99    3 3.01 3.02 
##    1    1    1    2    4    1    4    3    4    1    5    2    6    5    8 
## 3.03 3.04 3.05 3.06 3.07 3.08 3.09  3.1 3.11 3.12 3.13 3.14 3.15 3.16 3.17 
##    6   10    8   10   11   11   11   19    9   20   13   21   34   36   27 
## 3.18 3.19  3.2 3.21 3.22 3.23 3.24 3.25 3.26 3.27 3.28 3.29  3.3 3.31 3.32 
##   30   25   39   36   39   32   29   26   53   35   42   46   57   39   45 
## 3.33 3.34 3.35 3.36 3.37 3.38 3.39  3.4 3.41 3.42 3.43 3.44 3.45 3.46 3.47 
##   37   43   39   56   37   48   48   37   34   33   17   29   20   22   21 
## 3.48 3.49  3.5 3.51 3.52 3.53 3.54 3.55 3.56 3.57 3.58 3.59  3.6 3.61 3.62 
##   19   10   14   15   18   17   16    8   11   10   10    8    7    8    4 
## 3.63 3.66 3.67 3.68 3.69  3.7 3.71 3.72 3.74 3.75 3.78 3.85  3.9 4.01 
##    3    4    3    5    4    1    4    3    1    1    2    1    2    2

The pH distribution is normal.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0040

The distribution of density appears to be normal, and the difference between the minimum and maximum is only 0.014 g / cm^3. (For comparison, the difference in density between water and pure ethanol is about 0.21 g / cm^3.)

Ref : https://en.wikipedia.org/wiki/Ethanol

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5500  0.6200  0.6581  0.7300  2.0000
##  95% 
## 0.93

The long-tailed data are transformed to better show the distribution of sulphates. After the transformation, the distribution of sulphates appears to be normal. 95% of the wines have sulphates at or below 0.93.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

The distribution for alcohol appears to be right skewed.

## feature
##   3   4   5   6   7   8 
##  10  53 681 638 199  18
## [1] 0.9493433

Most wines (94.9%) have a quality score between 5 and 7. I will convert this feature to a factor for the multivariate analysis.
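The 94.9% share can be reproduced directly from the counts in the table above. The sketch below also shows one way to turn quality into a three-level ordered factor. The cut points and the level name "excellent" are assumptions: they match the first ten rows of the wine_rating column printed in the Reflection section (5 maps to level 1, 6 to level 2, 7 to level 3), but the report does not state them.

```r
# Counts per quality score, copied from the table above.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Share of wines with a quality score of 5, 6 or 7.
share_5_to_7 <- sum(quality_counts[c("5", "6", "7")]) / sum(quality_counts)

# Rebuild a quality vector with the same counts (row order is not preserved).
quality <- rep(as.integer(names(quality_counts)), times = quality_counts)

# Assumed grouping: <= 5 "normal", 6 "good", >= 7 "excellent".
wine_rating <- cut(quality,
                   breaks = c(-Inf, 5, 6, Inf),
                   labels = c("normal", "good", "excellent"),
                   ordered_result = TRUE)

print(share_5_to_7)
print(table(wine_rating))
```

An ordered factor keeps the ranking of the groups, which is useful later when the groups are used as a colour or facet variable in plots.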

Univariate Analysis

What is the structure of your dataset?

ANS: There are 1599 wines in the data set, with 12 features.

##   fixed.acidity volatile.acidity citric.acid sugar chlorides free.SO2
## 1           7.4             0.70        0.00   1.9     0.076       11
## 2           7.8             0.88        0.00   2.6     0.098       25
## 3           7.8             0.76        0.04   2.3     0.092       15
## 4          11.2             0.28        0.56   1.9     0.075       17
## 5           7.4             0.70        0.00   1.9     0.076       11
## 6           7.4             0.66        0.00   1.8     0.075       13
##   total.SO2 density   pH sulphates alcohol quality
## 1        34  0.9978 3.51      0.56     9.4       5
## 2        67  0.9968 3.20      0.68     9.8       5
## 3        54  0.9970 3.26      0.65     9.8       5
## 4        60  0.9980 3.16      0.58     9.8       6
## 5        34  0.9978 3.51      0.56     9.4       5
## 6        40  0.9978 3.51      0.56     9.4       5

Input variables (based on physicochemical tests):

  1. fixed acidity (tartaric acid - g / dm^3): most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
  2. volatile acidity (acetic acid - g / dm^3): the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
  3. citric acid (g / dm^3): found in small quantities, citric acid can add ‘freshness’ and flavor to wines
  4. residual sugar (g / dm^3): the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
  5. chlorides (sodium chloride - g / dm^3): the amount of salt in the wine
  6. free sulfur dioxide (mg / dm^3): the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
  7. total sulfur dioxide (mg / dm^3): amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
  8. density (g / cm^3)
  9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
  10. sulphates (potassium sulphate - g / dm^3): a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
  11. alcohol (% by volume): the percent alcohol content of the wine

Output variable (based on sensory data): 12. - quality (score between 0 and 10)

What is/are the main feature(s) of interest in your dataset?

ANS: The main feature of interest is the wine’s quality. I would like to investigate which variables affect it.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

ANS: I think smell, taste, touch and addictive content affect a wine’s quality, so the features I chose to investigate are:

  1. sulphates
  2. volatile acidity
  3. citric acid
  4. chlorides
  5. sum of acid (tartaric acid + citric acid ) as “sourness”
  6. alcohol
##   fixed.acidity volatile.acidity citric.acid sugar chlorides free.SO2
## 1           7.4             0.70        0.00   1.9     0.076       11
## 2           7.8             0.88        0.00   2.6     0.098       25
## 3           7.8             0.76        0.04   2.3     0.092       15
## 4          11.2             0.28        0.56   1.9     0.075       17
## 5           7.4             0.70        0.00   1.9     0.076       11
## 6           7.4             0.66        0.00   1.8     0.075       13
##   total.SO2 density   pH sulphates alcohol quality sourness
## 1        34  0.9978 3.51      0.56     9.4       5   5.1800
## 2        67  0.9968 3.20      0.68     9.8       5   5.4600
## 3        54  0.9970 3.26      0.65     9.8       5   5.4784
## 4        60  0.9980 3.16      0.58     9.8       6   8.0976
## 5        34  0.9978 3.51      0.56     9.4       5   5.1800
## 6        40  0.9978 3.51      0.56     9.4       5   5.1800

Did you create any new variables from existing variables in the dataset?

Yes, I created “sourness” from fixed.acidity and citric.acid to represent the sourness of the wine.
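The printed sourness values are not a plain sum of the two columns: row 1 has fixed.acidity 7.4 and citric.acid 0 but sourness 5.18. The six printed rows are consistent with a weighted sum, 0.7 * fixed.acidity + 0.46 * citric.acid. These weights are inferred from the printed output, not stated in the report, so the sketch below is only a check of that inference:

```r
# First six rows of the two source columns, copied from the head() output above.
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  citric.acid   = c(0.00, 0.00, 0.04, 0.56, 0.00, 0.00)
)

# Weights inferred from the printed sourness column (an assumption, not the author's code).
w_fixed  <- 0.7
w_citric <- 0.46

wine_head$sourness <- w_fixed * wine_head$fixed.acidity +
  w_citric * wine_head$citric.acid

# Sourness values as printed in the report.
printed <- c(5.1800, 5.4600, 5.4784, 8.0976, 5.1800, 5.1800)

print(wine_head)
print(all.equal(wine_head$sourness, printed))
```

If the weights were meant to reflect the relative sourness of tartaric and citric acid, that rationale would be worth stating next to the definition.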

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

ANS: The distributions of citric acid, volatile.acidity and free.SO2 appear to have more than one peak. I tidied the data by removing the X feature, which I am not interested in, and combined fixed.acidity and citric.acid into sourness for the next stage of the investigation.

Bivariate Plots Section

The top correlations with quality are:

  1. alcohol: 0.476
  2. volatile.acidity: -0.391
  3. sulphates: 0.251
  4. citric acid: 0.226

WineQuality.vs.volatile.acidity

## 
##  Pearson's product-moment correlation
## 
## data:  feature and quality
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4313210 -0.3482032
## sample estimates:
##        cor 
## -0.3905578

WineQuality.vs.citric.acid

## 
##  Pearson's product-moment correlation
## 
## data:  feature and quality
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1793415 0.2723711
## sample estimates:
##       cor 
## 0.2263725

WineQuality.vs.sulphates

## 
##  Pearson's product-moment correlation
## 
## data:  feature and quality
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049011 0.2967610
## sample estimates:
##       cor 
## 0.2513971

WineQuality.vs.alcohol

## 
##  Pearson's product-moment correlation
## 
## data:  feature and quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4373540 0.5132081
## sample estimates:
##       cor 
## 0.4761663
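The t statistic and the 95% confidence interval in each block above follow from the correlation and the sample size alone. As a worked check for alcohol, using the standard formulas behind `cor.test` for Pearson's correlation (t from r and the degrees of freedom, and the interval from Fisher's z transform):

```r
# Values copied from the WineQuality.vs.alcohol output above.
r <- 0.4761663   # sample correlation
n <- 1599        # number of wines, so df = n - 2 = 1597

# t statistic for testing that the true correlation is zero.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via Fisher's z transform.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(t_stat)
print(ci)
```

With 1597 degrees of freedom even a modest correlation gives a very small p-value, so the size of the correlation is more informative here than its significance.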

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

ANS: From the plots and correlation values, sulphates, citric acid and alcohol are positively related to quality, while volatile acidity is negatively related to quality.

The plots for alcohol, sulphates and volatile acidity show the differences between the three wine ratings very well, but citric acid distinguishes normal wine from good wine poorly.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

## 
##  Pearson's product-moment correlation
## 
## data:  featureX and featureY
## t = -26.489, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5856550 -0.5174902
## sample estimates:
##        cor 
## -0.5524957

## 
##  Pearson's product-moment correlation
## 
## data:  featureX and featureY
## t = 13.159, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2678558 0.3563278
## sample estimates:
##     cor 
## 0.31277

## 
##  Pearson's product-moment correlation
## 
## data:  featureX and featureY
## t = 4.4188, df = 1597, p-value = 1.059e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.06121189 0.15807276
## sample estimates:
##       cor 
## 0.1099032

ANS: I found that citric acid and volatile acidity are quite strongly correlated.

High correlation:

citric acid and volatile acidity : -0.5524957
citric acid and sulphates : 0.31277

Low correlation:

citric acid and alcohol : 0.1099032

What was the strongest relationship you found?

ANS: Among the correlations with the feature of interest, alcohol percentage has the highest value (0.476).
Among all pairs of features, free.SO2 and total.SO2 have the highest correlation value. (0.66

Multivariate Plots Section

First, I need to prepare alcohol.level for the multivariate plots.
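The report does not show how alcohol.level was built. A minimal sketch, assuming it was made with `cut()`: the break points and the second and third level names below are hypothetical, chosen only so that the first ten alcohol values reproduce the level codes printed in the Reflection section (9.8 falls in level 1, 10 and 10.5 in level 2).

```r
# First ten alcohol values, copied from the str() output.
alcohol <- c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5)

# Hypothetical break points and labels; only "low alcohol" appears in the printed output.
alcohol.level <- cut(alcohol,
                     breaks = c(-Inf, 9.9, 11.5, Inf),
                     labels = c("low alcohol", "medium-low alcohol", "medium alcohol"),
                     ordered_result = TRUE)

# Integer codes of the levels for the ten rows.
level_codes <- as.integer(alcohol.level)
print(level_codes)
```

Grouping a continuous feature this way lets it serve as a facet or fill variable, which is what the multivariate plots below rely on.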

The plot shows that excellent wines mostly sit in the top left, good wines in the middle and normal wines in the bottom right.

The Wine rating.vs.alcohol.level.vs.volatile.acidity plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the volatile.acidity range 0.25-0.4.

The Wine rating.vs.alcohol.level.vs.citric.acid plot shows that the ratio of excellent wine in alcohol grade “medium” is very high at citric.acid values of 0 and 0.4.

The Wine rating.vs.alcohol.level.vs.total.SO2 plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the total.SO2 range 5-30.

The Wine rating.vs.alcohol.level.vs.sulphates plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the sulphates range 0.7-0.9.

No pattern is noticeable here.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The plots show that alcohol is the feature with the highest impact.

The Wine rating.vs.alcohol.level.vs.volatile.acidity plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the volatile.acidity range 0.25-0.4.
The Wine rating.vs.alcohol.level.vs.citric.acid plot shows that the ratio of excellent wine in alcohol grade “medium” is very high at citric.acid values of 0 and 0.35-0.5.
The Wine rating.vs.alcohol.level.vs.total.SO2 plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the total.SO2 range 5-30.
The Wine rating.vs.alcohol.level.vs.sulphates plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the sulphates range 0.7-0.9.
The Wine rating.vs.alcohol.level.vs.sourness.vs.chlorides plot shows that it is hard to determine wine quality by the tongue (chlorides and sourness).

Were there any interesting or surprising interactions between features?

It is very surprising that smell (total.SO2) influences the wine rating but taste (chlorides and sourness) does not.


Final Plots and Summary

Plot One

Alcohol Distribution

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

Description One

The distribution of alcohol appears right skewed, with the median at 10.2%, and 90% of the red wines in the data have alcohol between 9.2% and 12.5%. Perhaps demand for red wines and buyers’ purchasing habits shape the distribution this way.

Plot Two

How does each feature influence the wine rating?

Description Two

Red wines with higher quality scores (the best in this group is 8) tend to have lower volatile acidity, and the lowest quality score (the worst in this group is 3) has the highest median volatile acidity. The variance of volatile acidity decreases as the quality score increases.

Plot Three

What makes wine excellent ?

Description Three

From the plot, the excellent-quality red wines have the highest median potassium sulphate, and the proportion of excellent red wines is greater at the medium alcohol level than at the other alcohol levels.


Reflection

## 'data.frame':    1599 obs. of  15 variables:
##  $ fixed.acidity   : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity: num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid     : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ sugar           : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides       : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.SO2        : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.SO2       : num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density         : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH              : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates       : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol         : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality         : int  5 5 5 6 5 5 5 7 7 5 ...
##  $ sourness        : num  5.18 5.46 5.48 8.1 5.18 ...
##  $ wine_rating     : Ord.factor w/ 3 levels "normal"<"good"<..: 1 1 1 2 1 1 1 3 3 1 ...
##  $ alcohol.level   : Ord.factor w/ 3 levels "low alcohol"<..: 1 1 1 1 1 1 1 2 1 2 ...

The data set contains 1599 wines from 2009. I started by examining the distributions of the variables in the data set and tried to interpret them in terms of senses that humans can perceive. First, I found that the distribution of alcohol is right skewed; I believe that demand for red wine drives this distribution. From the bivariate and multivariate analysis, I found no evidence that sour and salty tastes influence the quality of wine, but smell (total sulfur dioxide), addictive content (alcohol), volatile acidity, citric acid and sulphates do influence it.

At a low alcohol percentage, an excellent wine_rating is hardly ever found; most wines are normal, and some good wines can be found if total SO2 is in the range 5-60 and sulphates are in the range 0.53-0.73. At a medium-low alcohol percentage, excellent wine can be found at low volatile acidity and total sulfur dioxide below 55, but most wines are rated normal or good. At a medium alcohol percentage, a high proportion of excellent red wine can be found at total sulfur dioxide below 50 and sulphates above 0.65, and most wines are rated excellent or good.

At first I struggled to build multivariate plots that clearly present the relationship between more than one feature and wine quality. I eventually found that it is easier if I create new variables that represent a feature as groups. I also could not clearly present the relationships of the selected features with geom_point, since none of the features has a strong correlation value, but this became much better when I decided to use histograms.

After some research (Google) and asking a friend who drinks wine, I found out that there are many significant variables that we do not have, such as the type of grape, where the wine was made and the age of the wine. I am confident that if the data had these variables, I could provide a more insightful analysis of red wine quality.